Students will learn to conduct 
statistical analyses of a variety of datasets using a 
wide range of modern computational methods. The 
students will learn how to recognize which tools 
are needed to analyze different types of datasets, 
how to apply these tools in each case, and how to 
employ diagnostics to assess the quality of their 
results. They will learn about statistical models, their 
complexity, and their relative benefits depending 
on the available data. The tools that 
students will learn include simple 
and multiple linear regression, nearest-neighbor methods, 
shrinkage methods (ridge, lasso), dimension 
reduction methods (principal components), logistic 
regression, linear discriminant analysis, tree-based 
methods, criterion-based and resampling-based 
model selection, and clustering. 
The course will focus less on theory 
and more on building students' 
intuition and acquainting them with as 
many methods as possible. The course will make 
substantial use of the R statistical programming 
language and its libraries. 
Outcome: Not Provided